[SC-9555] Added section to inject custom context via docstring #354

Merged
juanmleng merged 12 commits into main from
juan/sc-9555/add-docstring-custom-prompting-example-to-test-result-description-notebook
May 2, 2025

Conversation

@juanmleng juanmleng commented Apr 23, 2025

Internal Notes for Reviewers

It’s important to note that the docstring serves as a signal to guide the LLM in understanding the test’s purpose, mechanism, strengths, and limitations. It is not interpreted as a strict instruction set, nor are its contents copied verbatim.

To ensure a specific line is included in the LLM’s output, it should be clearly formatted as an instruction. For example, the following syntax works reliably:

INSTRUCTION: Please add the following note at the end of the description:
"NOTE: This is a sample of the data, for the full data results please look in the appendix."

This explicit instruction helps ensure the LLM includes the desired text in the final description.
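
A minimal sketch of how such an instruction line might sit inside a test docstring. The function name `sample_data_test`, its argument, and its body are hypothetical and are not taken from the notebook; only the `INSTRUCTION:` text comes from the note above.

```python
import inspect


def sample_data_test(rows):
    """
    Reports how many rows are present in a sample of the dataset.

    INSTRUCTION: Please add the following note at the end of the description:
    "NOTE: This is a sample of the data, for the full data results please look in the appendix."
    """
    # Hypothetical test body; only the docstring matters for this sketch.
    return len(rows)


# inspect.getdoc returns the docstring with its indentation cleaned up.
doc = inspect.getdoc(sample_data_test)

# Collect the lines that are phrased as explicit instructions.
instruction_lines = [
    line for line in doc.splitlines() if line.startswith("INSTRUCTION:")
]
```

The descriptive first sentence acts as context only, while the line starting with `INSTRUCTION:` is the part phrased as a directive.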

An example has been added to the notebook demonstrating how to append instructions to the default docstring. In this example, two instructions are provided:

  • Specify a fixed number of key insights to include.
  • Add a disclaimer note at the end of the description.

[Screenshot 2025-04-22 at 14 02 58]

External Release Notes

A new section titled "Add test-specific context using the docstring" has been added to the add_context_to_llm_descriptions.ipynb notebook. This section provides guidance on how users can embed explicit instructions within a test's docstring to influence LLM-generated test result descriptions.

The enhancement addresses a common issue where non-instructional context in docstrings was ignored by the LLM. Users are now advised to format specific lines as explicit instructions, for example with the following syntax, so that they are reliably included:

INSTRUCTION: Please add the following note at the end of the description: "NOTE: This is a sample of the data, for the full data results please look in the appendix."

@juanmleng juanmleng added the documentation Improvements or additions to documentation label Apr 23, 2025
@juanmleng juanmleng self-assigned this Apr 23, 2025
@github-actions

PR Summary

This pull request enhances the Jupyter notebook add_context_to_llm_descriptions.ipynb by introducing a new section that demonstrates how to add test-specific context to LLM (Large Language Model) descriptions using docstrings. The changes include:

  1. Table of Contents Update: Added a new entry for the section on adding test-specific context using docstrings.

  2. New Markdown Section: Introduced a markdown section explaining the process of customizing test result descriptions by adding explicit instructions to the test docstring.

  3. Custom Test Implementation: Implemented a custom test MissingValues with a detailed docstring following the ValidMind docstring structure. This test evaluates dataset quality by ensuring the missing value ratio across all features does not exceed a set threshold.

  4. Post-Processing Function: Added a function add_instructions that appends custom instructions to the test docstring. This function modifies the default docstring to include additional instructions for generating key insights and appending a note to the output.

  5. Test Execution: Demonstrated the execution of the custom test with and without the post-processing function to show how the LLM-generated description is updated.

These changes aim to provide users with a method to enhance the context of test results, making them more informative and tailored to specific needs.
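
As a rough sketch of the post-processing idea in item 4, the function below appends two instructions to an existing docstring. The notebook's actual `add_instructions` signature is not shown in this thread, so the signature, the `n_insights` parameter, its default of 3, and the sample docstring text are all assumptions made for illustration.

```python
# The note text is taken from the PR description; everything else is illustrative.
NOTE = (
    '"NOTE: This is a sample of the data, for the full data results '
    'please look in the appendix."'
)


def add_instructions(docstring: str, n_insights: int = 3) -> str:
    """Return the docstring with two explicit instructions appended."""
    base = (docstring or "").rstrip()
    extra = [
        # Hypothetical wording for the "fixed number of key insights" instruction.
        f"INSTRUCTION: Please include exactly {n_insights} key insights in the description.",
        f"INSTRUCTION: Please add the following note at the end of the description: {NOTE}",
    ]
    return base + "\n\n" + "\n".join(extra)


default_doc = "Evaluates dataset quality by checking the ratio of missing values."
updated_doc = add_instructions(default_doc)
```

The original docstring is left intact at the start of the result, so the descriptive context still reaches the LLM and the instructions are simply added after it.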

Test Suggestions

  • Run the notebook to ensure that the new section on adding test-specific context is correctly rendered and integrated into the table of contents.
  • Execute the MissingValues test without the post-processing function to verify that the default docstring is used.
  • Execute the MissingValues test with the add_instructions post-processing function to confirm that the docstring is correctly modified and the LLM-generated description reflects the appended instructions.
  • Check that the test results include the custom instructions and note as specified in the post-processing function.
  • Verify that the notebook runs without errors and that all code cells execute successfully.

@juanmleng juanmleng added the enhancement New feature or request label Apr 23, 2025
@github-actions

PR Summary

This pull request introduces two main changes:

  1. GitHub Actions Enhancement: The permissions for the release_notes_check.yaml workflow have been updated to allow write access to pull requests. This change is likely intended to enable the workflow to perform actions that require write permissions on pull requests, such as updating or commenting on PRs.

  2. Notebook Enhancements: The Jupyter notebook add_context_to_llm_descriptions.ipynb has been updated to include a new section on adding test-specific context using docstrings. This includes:

    • A new markdown section explaining how to customize test result descriptions by adding explicit instructions to the test docstring.
    • A new code cell that demonstrates the implementation of a custom test MissingValues with a detailed docstring following the ValidMind structure.
    • A post-processing function add_instructions that appends additional instructions to the test docstring, demonstrating how to modify the default docstring to include custom instructions.
    • Execution counts for various code cells have been updated, indicating that the notebook has been run to reflect these changes.

These changes enhance the functionality of the GitHub workflow and improve the documentation and usability of the notebook by providing more context and customization options for test descriptions.
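
For readers unfamiliar with the kind of check `MissingValues` performs, here is a minimal standard-library sketch of a missing-value ratio test. It is not the notebook's implementation and does not use the ValidMind API; the function names, the `max_ratio` parameter, and the sample data are hypothetical.

```python
from typing import Dict, List, Optional


def missing_value_ratios(columns: Dict[str, List[Optional[float]]]) -> Dict[str, float]:
    """Return the fraction of missing (None) entries for each feature."""
    return {
        name: (sum(value is None for value in values) / len(values)) if values else 0.0
        for name, values in columns.items()
    }


def passes_missing_values_check(
    columns: Dict[str, List[Optional[float]]], max_ratio: float = 0.1
) -> bool:
    """Pass only if no feature exceeds the allowed missing-value ratio."""
    return all(ratio <= max_ratio for ratio in missing_value_ratios(columns).values())


# Illustrative data: one of four "age" values is missing.
data = {
    "age": [31, None, 45, 52],
    "income": [50_000.0, 61_000.0, 58_500.0, 47_250.0],
}
ratios = missing_value_ratios(data)
passed = passes_missing_values_check(data, max_ratio=0.3)
```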

Test Suggestions

  • Verify that the GitHub Actions workflow with updated permissions can successfully perform actions that require write access to pull requests.
  • Run the Jupyter notebook to ensure that the new section on adding context to LLM descriptions is correctly implemented and that the custom test MissingValues executes as expected.
  • Test the add_instructions function to confirm that it correctly appends the specified instructions to the test docstring.
  • Check that the modified notebook cells execute without errors and produce the expected outputs.

@github-actions

Pull requests must include a description in the release notes section.

@validbeck validbeck left a comment

Sorry for taking so long. Some suggestions and a request: can we include a short sentence describing how modifying the docstring differs from the other methods? For example, when would you use this over the others?

juanmleng and others added 2 commits April 29, 2025 11:28
End sentences with colon

Co-authored-by: Beck <164545837+validbeck@users.noreply.github.com>
@juanmleng juanmleng requested a review from validbeck April 29, 2025 09:57
@validbeck validbeck left a comment

Thanks Juan, that makes sense to me! I just touched up the writing a bit but it's good to go to me.

Just a quick question for my understanding, when we say:

> Use this method when you want instructions to remain an intrinsic part of the test's definition, eliminating the need to repeatedly set environment variables in different execution contexts.

This means it persists across the user's local environment for when they run this particular test (past the Jupyter Notebook session) and not that it's globally set for all users in their organization, correct?

@juanmleng juanmleng requested a review from validbeck April 30, 2025 09:23
@juanmleng

> Thanks Juan, that makes sense to me! I just touched up the writing a bit but it's good to go to me.
>
> Just a quick question for my understanding, when we say:
>
> > Use this method when you want instructions to remain an intrinsic part of the test's definition, eliminating the need to repeatedly set environment variables in different execution contexts.
>
> This means it persists across the user's local environment for when they run this particular test (past the Jupyter Notebook session) and not that it's globally set for all users in their organization, correct?

Yes, exactly, that’s correct. When we embed the instructions directly in the test’s docstring, they become part of the test source code. This ensures they persist for that specific test across different environments and sessions for the user running it, without needing to rely on external environment variables.
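
A small sketch of that difference, assuming a hypothetical environment variable name (`EXAMPLE_LLM_CONTEXT`) and a hypothetical test function; neither is taken from the notebook. The docstring travels with the function wherever it is imported, while the environment variable is only visible if it was set for the current process.

```python
import inspect
import os

# Hypothetical environment variable name, used only for this sketch.
CONTEXT_ENV_VAR = "EXAMPLE_LLM_CONTEXT"


def missing_values_check():
    """
    Checks that the share of missing values stays below a threshold.

    INSTRUCTION: Please add the following note at the end of the description:
    "NOTE: This is a sample of the data, for the full data results please look in the appendix."
    """


def context_from_docstring(func) -> str:
    # The docstring is part of the source, so it is available wherever the test is imported.
    return inspect.getdoc(func) or ""


def context_from_environment() -> str:
    # An environment variable is only visible if it was set for the current process.
    return os.environ.get(CONTEXT_ENV_VAR, "")


# Simulate a fresh session in which nobody has exported the variable.
os.environ.pop(CONTEXT_ENV_VAR, None)

from_doc = context_from_docstring(missing_values_check)
from_env = context_from_environment()
```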

Added a section at the end of the notebook to summarise the two methods and highlight best practices, with the idea of making things a bit clearer. Let me know if it is clear lol

@validbeck

> Added a section at the end of the notebook to summarise the two methods and highlight best practices, with the idea of making things a bit clearer. Let me know if it is clear lol

Nice! The new sections sum up the method nicely, thank you so much.


github-actions bot commented May 2, 2025

PR Summary

This pull request introduces several enhancements and modifications across different files in the project:

  1. Workflow Permissions Update:

    • The GitHub Actions workflow file .github/workflows/release_notes_check.yaml has been updated to change the permissions for pull-requests from read to write. This change allows the workflow to have write access to pull requests, which may be necessary for certain automated tasks or updates.
  2. Notebook Enhancements:

    • In the notebook notebooks/how_to/add_context_to_llm_descriptions.ipynb, new sections have been added to guide users on how to add test-specific context using docstrings. This includes detailed markdown instructions and code examples demonstrating how to customize test result descriptions by modifying docstrings.
    • Execution counts in the notebook cells have been updated, indicating that the cells have been executed and the results are now part of the notebook.
    • Additional markdown content has been added to provide best practices for adding custom context to test result descriptions.
  3. Tutorial Updates:

    • Minor text updates in tutorial notebooks (notebooks/tutorials/model_development/2-start_development_process.ipynb and notebooks/tutorials/model_development/3-integrate_custom_tests.ipynb) to clarify instructions for selecting test-driven blocks from the library.
    • A typo correction in notebooks/tutorials/model_validation/3-developing_challenger_model.ipynb to improve the readability of the description of a challenger model.
  4. Image and GIF Updates:

    • New images and GIFs have been added to the notebooks/tutorials/model_development directory to support the updated tutorial content.

These changes aim to improve the functionality and clarity of the documentation and workflows within the project.

Test Suggestions

  • Verify that the GitHub Actions workflow with updated permissions executes successfully and performs the intended write operations on pull requests.
  • Run the updated notebook add_context_to_llm_descriptions.ipynb to ensure that the new sections and code examples execute without errors and produce the expected outputs.
  • Check that the tutorial notebooks reflect the updated instructions and that users can follow them to successfully select test-driven blocks.
  • Ensure that the new images and GIFs are correctly displayed in the tutorial notebooks and enhance the understanding of the instructions.
  • Test the functionality of the MissingValues test with various datasets to confirm that the docstring modifications are correctly applied and that the test results are accurate.

@juanmleng juanmleng merged commit c991a8e into main May 2, 2025
7 checks passed
@johnwalz97 johnwalz97 deleted the juan/sc-9555/add-docstring-custom-prompting-example-to-test-result-description-notebook branch August 20, 2025 17:03
Labels

documentation Improvements or additions to documentation enhancement New feature or request
